Most recent reviews on MYB (Box 1) transcription 
factors have focused on the structure and function of 
the vertebrate proteins 1-4 . Here we attempt to draw 
together what is known of the plant MYB transcription 
factor family, by analysing the structure of plant pro- 
teins relative to the prototypic vertebrate proteins, and 
by summarizing what is known of the function of MYB-related 
transcription factors in the growth and metabolism of 
plants. We suggest that the MYB family is very 
important in transcriptional control in higher plants 
because of the number of genes involved and because 
of their roles in the control of plant-specific processes. 

How similar are plant MYB proteins to their 
prototypic animal counterparts? 

The structural characteristic common to all known 
MYB proteins is the DNA-binding domain, the signature 
motif of transcription factor families. MYB proteins con- 
taining this domain have been shown to bind to DNA in 
a sequence-specific manner 5,6. Additionally, many MYB 
proteins contain regions with the features of activator 
domains (generally of the negatively charged type), 
although for plant MYB proteins it is only in certain 
cases that there is evidence that these domains serve in 
transcriptional activation 7-9 . 

MYB proteins from animals generally contain three 
repeats (R1, R2 and R3). The MYB DNA-binding 
domain of plant proteins usually consists of two imper- 
fect repeats of about 50 residues (R2, R3), although 
exceptionally it can contain only one of these repeats 9 
(B. Weisshaar and M. Feldbrugge, pers. commun.). 
Some MYB proteins identified from fungi also have just 
two repeats 10,11. A comparison between the amino acid 
sequences of representative plant and mammalian MYB 
proteins reveals that there is greater conservation 
between the same repeat from different proteins than 
between the R2 and R3 repeats from the same protein 
(Fig. 1), in line with the idea that each repeat has a spe- 
cialized function. Structural analyses of the MYB DNA- 
binding domain have used c-MYB itself and its verte- 
brate homologue MYBL2 (also known as B-MYB) most 
extensively, but, given the high sequence similarity 
between MYB proteins (Fig. 1), molecular modelling 
predicts that the structural characteristics of all two- 
repeat plant MYB proteins are very similar to those of 
c-MYB (Refs 12, 13). This presumed structural homology 
has been supported by DNA-binding studies of 
chimeras of plant MYB proteins and c-MYB, as well as 
through site-directed mutagenesis of one plant MYB 
protein, PhMYB3 (Ref. 13). 

Each c-MYB repeat folds into a variant of the 
helix-turn-helix motif, similar to that of the prokaryotic 
LexA protein, and contains three regularly spaced tryp- 
tophan residues (Fig. 1). These tryptophans play a role 
in the folding of the hydrophobic core of the MYB 
domain, and are generally conserved in all MYB pro- 
teins, although the first tryptophan residue in the R3 
repeat is substituted by another aromatic or hydropho- 
bic amino acid in most plant MYB proteins. The solu- 
tion structure of the complex between the minimal 
c-MYB DNA-binding domain (R2R3) and its target DNA 
has revealed a physical interaction between the two 
repeats. There is a partial overlap in DNA-binding 
between the repeats, with R2 and R3 positioned 
towards the 3' and 5' parts of the core motif bound by 
all vertebrate MYB proteins 14. A probable consequence 
of this interaction is that it has imposed constraints on 
repeat co-evolution. In agreement with this, MYB chimaeras 
prepared by combinations of R2 and R3 repeats 
from different protein sources generally have reduced 
binding affinity when compared with their progenitors. 

The cloning of the first transcription factor from plants, 
the C1 gene of maize, indicated that plants use 
transcription factors that are structurally related to those 
of animals in their control of gene expression, because C1 
showed significant structural homology to the vertebrate 
cellular proto-oncogene c-MYB. Since 1987, the catalogue of 
MYB-related transcription factors has increased 
considerably in size due, primarily, to the ever-expanding 
number of MYB genes identified in higher plants 
(Arabidopsis thaliana is estimated to contain more than a 
hundred MYB genes). In vertebrates, the MYB-related 
proto-oncogenes comprise a small family with a central 
role in controlling cellular proliferation and commitment to 
development. However, while the functions of some plant 
MYB genes are relatively well understood they are, at 
present, quite distinct from their animal counterparts. 

The DNA-binding specificity of plant MYB proteins 
differs considerably between themselves, as well as 
from that of the vertebrate MYB proteins 15-19. For 
instance, the maize P protein recognizes the motif 
[C/A]TCC[T/A]ACC similar to that bound by AmMYB305 
from Antirrhinum, and neither of these proteins 
appears to bind to the similar vertebrate MYB consensus 
motif (TAACNG) (Refs 17, 18). PhMYB3 from Petunia 
binds to two sequences, MBSI (TAAC[C/G]GTT) 
and MBSII (TAACTAAC) (Ref. 18). In the case of 
PhMYB3, it has been shown that a substitution of a 
single residue in the R2 recognition helix, Leu44 to Glu 
(for nomenclature of positions, see Fig. 1), switches the 
dual DNA-binding specificity to that of c-MYB, and the 
reciprocal substitution in c-MYB, Glu43 to Leu, gives 
dual DNA-binding specificity similar to PhMYB3 (Ref. 
13). In agreement with experimental data, molecular 
modelling predicts that the presence in PhMYB3 of Leu, 
instead of Glu, has two consequences: one direct, due 
to the change in base-contacting specificity of Leu ver- 
sus Glu (Ref. 20); and one indirect due to the inability 
of Leu, in contrast to Glu, to interact electrostatically 
with a base-contacting residue (Lys40), thereby facilitat- 
ing the interaction of Lys40 with either of two alterna- 
tive positions in the target DNA. 
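
The motifs quoted above can be compared mechanically by turning each one into a regular expression and scanning a sequence with it. The Python sketch below is an illustration only: the helper names (`motif_to_regex`, `find_sites`) and the test sequence are hypothetical, only the forward strand is scanned, and `N` is assumed to mean any base.

```python
import re

# Motifs as quoted in the text (bracketed alternatives, N = any base).
MOTIFS = {
    "P / AmMYB305": "[C/A]TCC[T/A]ACC",
    "vertebrate MYB consensus": "TAACNG",
    "PhMYB3 MBSI": "TAAC[C/G]GTT",
    "PhMYB3 MBSII": "TAACTAAC",
}


def motif_to_regex(motif: str) -> str:
    """Convert a motif such as '[C/A]TCC[T/A]ACC' into a regex pattern."""
    # Turn bracketed alternatives like '[C/A]' into the character class '[CA]'.
    pattern = re.sub(
        r"\[([ACGT](?:/[ACGT])+)\]",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif.upper(),
    )
    # 'N' never occurs inside the classes built above, so a plain replace is safe.
    return pattern.replace("N", "[ACGT]")


def find_sites(sequence: str, motif: str):
    """Return (0-based start, matched text) for every match, overlaps included."""
    # The lookahead makes each match zero-width so overlapping sites are reported.
    regex = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


# Hypothetical sequence assembled for illustration; it is not a real promoter.
EXAMPLE = "GGCTCCTACCATAACGGTTAGTAACTAACG"

if __name__ == "__main__":
    for name, motif in MOTIFS.items():
        print(name, motif_to_regex(motif), find_sites(EXAMPLE, motif))
```

On this made-up sequence the MBSI match and the vertebrate consensus match start at the same position, because TAACGGTT contains TAACGG. A pattern match of this kind says nothing about whether a given protein binds the site, which remains an experimental question.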

Mutations in residues that do not contact bases also 
affect sequence-specific binding and might account for 
some of the differences in DNA-binding specificity 
between plant MYB proteins 13,20. Of the eight putative 
base-contacting residues in MYB proteins, six are fully 
conserved in all plant MYB proteins, and the remaining 
two are conserved in at least 80% of these proteins. In 
particular, the P protein shares all the putative 
base-contacting residues with c-MYB or PhMYB3 (Fig. 1), 
but shows a very different DNA-binding behaviour to 
these two proteins. Therefore, protein context has a 
significant effect on the specificity properties of 
base-contacting residues and the strength of their contacts 
and might also influence the DNA bending or distorting 
properties of the proteins 21. In summary, although plant 
MYB proteins share the homologous MYB domain, 
differences in their base-contacting residues and in the 
overall context of their MYB domains produce distinct 
DNA-binding specificities in different members of the 
family. 

Box 1. Glossary of MYB names 

Name of MYB protein      Standardized nomenclature   Species 
c-MYB (Ref. 52)          HsMYB                       Homo sapiens 
B-MYB/MYBL2 (Ref. 53)    HsMYBL2                     Homo sapiens 
A-MYB/MYBL1 (Ref. 53)    HsMYBL1                     Homo sapiens 
C1 (Ref. 7)              ZmMYBC1                     Zea mays 
P1 (Ref. 17)             ZmMYBP1                     Zea mays 
Zm1 (Ref. 39)            ZmMYB1                      Zea mays 
Zm38 (Ref. 39)           ZmMYB38                     Zea mays 
GAMYB (Ref. 26)          HvMYBGa                     Hordeum vulgare 
Am305 (Ref. 22)          AmMYB305                    Antirrhinum majus 
Am340 (Ref. 22)          AmMYB340                    Antirrhinum majus 
MIXTA (Ref. 24)          AmMYBMx                     Antirrhinum majus 
GL1 (Ref. 42)            AtMYBGl1                    Arabidopsis thaliana 
AtMYB1 (Ref. 46)         AtMYB1                      Arabidopsis thaliana 
AtMYB2 (Ref. 15)         AtMYB2                      Arabidopsis thaliana 
MYBPh3 (Ref. 18)         PhMYB3                      Petunia hybrida 
AN2 (Ref. 40)            PhMYBAn2                    Petunia hybrida 
MYB (Ref. 54)            DmMYB                       Drosophila melanogaster 
MYB (Ref. 55)            DdMYB                       Dictyostelium discoideum 
CDC5 (Ref. 10)           SpMYBCD5                    Schizosaccharomyces pombe 
FLBD (Ref. 11)           AnMYBFD                     Aspergillus nidulans 

In this review we have used the original gene name when referring to MYB genes 
for which mutations are known. Where MYB genes have been isolated on the basis 
of sequence homology, we have used a standardized nomenclature giving first the 
initials of the species, then the term MYB, and then a term describing the particular 
family member derived from the original descriptions (see references). In the figures 
illustrating sequence similarities, we use this standardized nomenclature throughout 
so that MYB proteins from a single species can be readily identified. 

What governs MYB activity in plants? 

Gene activity can potentially be regulated at many 
different stages. Pretranslational control is evident from 
many differences in the organ-specific and temporal 
patterns of accumulation of RNA of different plant MYB 
genes 15,22,25 and in response to environmental stimuli, 
such as light, salt stress or the plant hormones, gibberellic 
acid and abscisic acid 15,26,27. 

Post-translational control can operate through differ- 
ent mechanisms, including cellular redox potential 28 , 
phosphorylation and protein-protein interactions. 
Thus, the in vitro DNA-binding capacity of two plant 
MYB proteins, AmMYB305 and PhMYB3, has been 
found to be sensitive to oxidation [R. Solano (1995) 
PhD Thesis, Univ. de Alcala], like 
c-MYB (Ref. 29), but redox control 
remains to be demonstrated in vivo 
in plants. 

Members of the plant MYB fam- 
ily contain several serine or threo- 
nine residues, especially in their 
C-terminal domains, which are 
possible substrates of kinases, sug- 
gesting that phosphorylation can 
affect the activity of some plant 
MYB proteins as it does for c-MYB 
(Refs 4, 22). Phosphorylation might 
influence DNA binding or tran- 
scriptional activation potential. 
However, experimental evidence 
for such control in plant MYB proteins 
is presently lacking, except 
for AmMYB340, a protein that 
shows little DNA binding when 
synthesized in vivo, but that recov- 
ers high-affinity binding, equivalent 
to that of the protein synthesized in 
vitro, after treatment of extracts 
with alkaline phosphatase 29 . 

MYB proteins can potentially 
compete for common target motifs. 
This type of interaction has been 
suggested to occur in mammalian 
cells where transcriptional acti- 
vation by c-MYB (or its retroviral 
derivative, v-MYB) can be inhibited 
in some cells by the activity of MYBL2, which can bind 
to the same DNA sequences 30,31. Such competition 
might also occur in plant cells, because MYB proteins 
with very similar DNA-binding domains can compete 
for a common target site; this has been shown for 
two flower-specific MYB proteins, AmMYB305 and 
AmMYB340 from Antirrhinum. The net activation of 
transcription of their target genes in any particular cell 
will be dependent on the relative amounts of the two 
proteins, their relative abilities to bind DNA and their 
differing abilities to activate transcription 29 . 
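
One way to make this dependence concrete is a simple equilibrium competitive-binding model. This is an illustrative assumption, not a model taken from the cited work. If each factor i is present at concentration c_i and binds the shared site with dissociation constant K_i, its fractional occupancy is (c_i/K_i) / (1 + sum_j c_j/K_j), and the net output is the occupancy-weighted sum of the activation strengths. All names and numbers in the sketch below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    name: str
    conc: float        # free concentration, arbitrary units (hypothetical)
    kd: float          # dissociation constant for the shared site, same units
    activation: float  # relative transcriptional output when bound


def occupancies(factors):
    """Fraction of time a single shared site is occupied by each factor."""
    weights = {f.name: f.conc / f.kd for f in factors}
    total = 1.0 + sum(weights.values())  # the 1.0 term is the unbound state
    return {name: w / total for name, w in weights.items()}


def net_activation(factors):
    """Occupancy-weighted sum of the activation strengths."""
    occ = occupancies(factors)
    return sum(occ[f.name] * f.activation for f in factors)


# Two made-up factors with equal abundance and affinity but unequal activation.
strong = Factor("factor_A", conc=1.0, kd=1.0, activation=1.0)
weak = Factor("factor_B", conc=1.0, kd=1.0, activation=0.2)

alone = net_activation([strong])
together = net_activation([strong, weak])

if __name__ == "__main__":
    print(f"factor_A alone: {alone:.2f}")
    print(f"factor_A with factor_B: {together:.2f}")
```

With these made-up numbers, adding an equal amount of the weaker activator lowers the output from 0.5 to 0.4, because it displaces the stronger one from a share of the sites. Changing any of the three quantities named in the text (amount, affinity, activation strength) shifts the result.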

MYB proteins also interact with other transcriptional 
regulators. Such interactions are widespread for c-MYB, 
and some are believed to involve interactions with a 
negative regulatory domain in the C-terminus of the 
protein that contains a leucine zipper motif. No similar 
domain has yet been described in any plant MYB protein. 
However, the C1 protein of maize interacts, in an 
obligate fashion, with the R protein, or its homologues 
(B, SN and LC), to promote anthocyanin biosynthesis 
(see below). R, LC, SN and B contain a bHLH 
(basic-helix-loop-helix) motif characteristic of MYC 
transcription factors 32. Experiments using the yeast two-hybrid 
system have shown that the C1 protein interacts directly 
with the B protein, an interaction requiring the 
N-terminal part of C1 (containing the MYB domain), and 
the N-terminal domain of B (which does not contain the 
bHLH domain). B derivatives lacking the bHLH motif 
still retain their competence to induce anthocyanin 
biosynthesis and can still bind C1, strongly suggesting 
that the major role of B is via its interaction with the C1 
protein 33. 
What do plant MYB transcription factors do? 

The three cellular members of 
the vertebrate MYB family, c-MYB, 
MYBL1 and MYBL2 recognize 
similar target motifs and all are 
believed to play roles in cellular 
proliferation and are expressed in 
actively dividing cells before differ- 
entiation, albeit in somewhat differ- 
ent tissues 34-36 . In plants there is 
good evidence for distinct functions 
for different MYB proteins; some 
controlling secondary metabolism, 
some regulating cellular morpho- 
genesis and some serving in the sig- 
nal transduction pathways respond- 
ing to plant growth regulators. 
Within these groups there are sub- 
groups of MYB proteins with over- 
lapping functions as seen with the 
vertebrate MYB proteins. 

Phenylpropanoid metabolism 

Phenylpropanoid metabolism is 
one of the three main types of 
secondary metabolism in plants in- 
volving modification of compounds 
derived initially from phenylalanine. 
Through one branch (flavonoid 
metabolism) it is responsible for the 
production of a major group of plant 
pigments (the anthocyanins) and 
other minor groups (aurones and 
phlobaphenes) and it also produces 
compounds that modify pigmen- 
tation through chemical interaction 
with the anthocyanins (co-pigmentation), 
such as the flavones and 
flavonols. Flavones and flavonols 
also serve to absorb ultraviolet light 
to protect plants. Several flavonoids 
act as signalling molecules in 
legumes inducing gene expression 
in symbiotic bacteria in a species- 
specific manner, and others act as 
factors required for pollen matura- 
tion and pollen germination in some 
plant species. A number of flavonoids 
and related phenylpropanoids 
(such as stilbenes) also act as defensive 
agents (phytoalexins) against 
biotic and abiotic stresses in particu- 
lar plant species. Another branch of 
phenylpropanoid metabolism pro- 
duces the precursors for production 
of lignin, the strengthening and 
waterproofing material of plant vas- 
cular tissue and one of the principal 
components of wood. This branch 
also produces other soluble pheno- 
lics, which can serve as signalling 
molecules, cell-wall crosslinking 
agents and antioxidants. 
Figure 1. The DNA-binding domain of plant MYB proteins: sequence and 
structure. (a) Sequence alignment of the R2 and R3 repeats of 
representative MYB proteins from plants, fungi and animals using the 
CLUSTALV program 51. To unify nomenclature all MYB proteins have been 
renamed to add a two letter prefix as a species identifier (Box 1). 
Residues conserved in all plant MYB proteins are highlighted in white 
letters. The three regularly spaced tryptophan residues present in each 
repeat of animal MYB proteins and known to be important in maintaining 
the hydrophobic core of the DNA-binding domain are labelled with 
asterisks. The positions corresponding to base-contacting residues of 
the murine homologue of HsMYB are marked with arrowheads (K39, E43, N47, 
N90, K93, N94, N97 and S98; where position 1 is the first residue of the 
R2 repeat) and the size reflects the strength of the contact. The three 
helices in each repeat 12,14 are shown in the lower part of the figure. 
The filled box corresponds to the recognition helix. (b) Structure of 
the MYB-DNA complex. Ribbon plot of the minimal DNA-binding domain of 
the murine homologue of HsMYB containing the R2 and R3 repeats, binding 
to its target DNA. The structure represented corresponds to the average 
of 25 NMR solutions 14. Molecular modelling 13 predicts similar 
structures for the different plant MYB proteins represented in 
Fig. 1(a). The region in the recognition helix of each repeat 
corresponding to base-contacting residues is highlighted in red. 
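
The numbering convention used in this caption (position 1 is the first residue of R2) can be checked directly on an R2 sequence. The Python sketch below assumes that the first row of the Fig. 1(a) alignment is the HsMYB R2 repeat and that the human and murine sequences agree at these positions. It uses that row with the alignment gap removed and covers only the three R2 positions named in the caption. The helper names are hypothetical.

```python
# R2 repeat as read from the first row of the Fig. 1(a) alignment, with the
# alignment gap removed. Treating this row as HsMYB is an assumption.
R2 = "LIKGPWTKEEDQRVIELVQKYGPKRWSVIAKHLKGRIGKQCRERWHNHLNPE"


def positions_of(seq: str, residue: str):
    """1-based positions at which `residue` occurs in `seq`."""
    return [i for i, aa in enumerate(seq, start=1) if aa == residue]


def residue_at(seq: str, position: int) -> str:
    """Residue at a 1-based position."""
    return seq[position - 1]


# The three regularly spaced tryptophans and the gaps between them.
trp_positions = positions_of(R2, "W")
trp_spacings = [b - a for a, b in zip(trp_positions, trp_positions[1:])]

# Base-contacting positions in R2 as labelled in the caption (letter + number).
R2_CONTACT_LABELS = ("K39", "E43", "N47")
contacts = {label: residue_at(R2, int(label[1:])) for label in R2_CONTACT_LABELS}

if __name__ == "__main__":
    print("tryptophans at", trp_positions, "spacings", trp_spacings)
    for label, found in contacts.items():
        print(label, "->", found, "(matches)" if label[0] == found else "(differs)")
```

Under these assumptions the tryptophans fall 19 to 20 residues apart, and the residues at positions 39, 43 and 47 agree with the one-letter labels in the caption. The R3 positions (N90 onwards) would need the R3 repeat appended in the same numbering.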
Figure 2. Summary of current understanding of the roles of 
MYB-related transcription factors in controlling phenylpropanoid 
metabolism from phenylalanine. MYB-related proteins known to 
control expression of the subsets of genes encoding enzymes 
involved in these steps are shown at the end of each pathway. 
Names with question marks refer to proteins whose role has been 
demonstrated only biochemically. The functions of MYB proteins 
without question marks have been demonstrated genetically. 
P has been shown to control steps 4, 5 and 7, and C1 to control 
steps 4, 7, 8, 9 and a glutathione-S-transferase (which is involved 
in transportation of anthocyanins to the vacuole). AN2 controls 
steps 7, 8 and 10. AmMYB305 and AmMYB340 have been shown 
to activate steps 1, 5 and 6, PhMYB3 has been shown to activate 
one gene encoding CHS (step 4) in Petunia (CHSJ), and 
ZmMYB1 has been shown to activate step 7. 
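
The step assignments listed in this caption can be held in a small lookup table and inverted to ask which regulators act on a given step. The Python sketch below is only a convenience for reading the caption: protein names follow Box 1, the step numbers are those of Fig. 2, the glutathione-S-transferase target of C1 is left out because it has no step number, and proteins from different species are pooled. The function name is hypothetical.

```python
from collections import defaultdict

# Regulator -> numbered Fig. 2 steps, transcribed from the caption.
CONTROLS = {
    "P": {4, 5, 7},
    "C1": {4, 7, 8, 9},
    "AN2": {7, 8, 10},
    "AmMYB305": {1, 5, 6},
    "AmMYB340": {1, 5, 6},
    "PhMYB3": {4},
    "ZmMYB1": {7},
}


def regulators_by_step(controls):
    """Invert the table: step number -> set of regulators listed for it."""
    by_step = defaultdict(set)
    for regulator, steps in controls.items():
        for step in steps:
            by_step[step].add(regulator)
    return dict(by_step)


BY_STEP = regulators_by_step(CONTROLS)

# Steps listed for both maize regulators P and C1.
SHARED_P_C1 = CONTROLS["P"] & CONTROLS["C1"]

if __name__ == "__main__":
    for step in sorted(BY_STEP):
        print(step, sorted(BY_STEP[step]))
    print("P and C1 share steps", sorted(SHARED_P_C1))
```

Inverting the table shows at a glance that step 7 has the most listed regulators, and that P and C1 overlap on steps 4 and 7 while diverging elsewhere.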



MYB proteins are known to play an important role 
in the control of phenylpropanoid metabolism. The C1 
protein activates transcription of genes encoding 
enzymes involved in the biosynthesis of the anthocyanin 
pigments in the outer layer of cells of the maize 
seed endosperm (the aleurone) 7,37,38. Activation has 
been demonstrated for five genes in the pathway to 
anthocyanin (Fig. 2), although C1 probably activates 
expression of all the genes required specifically for 
anthocyanin biosynthesis in the aleurone. Activation by 
C1 involves a partner transcriptional activator in aleurone 
encoded by the R gene 32. While C1 is active in 
aleurone, a very similar MYB protein, PL, is functional 
in controlling anthocyanin biosynthesis in the maize 
plant (including leaves, stems, and so on) where it 
interacts with other members of the R-protein family to 
activate anthocyanin biosynthetic gene expression 25. 

In maize, another MYB protein, ZmMYB1, can activate 
one of the structural genes required for anthocyanin 
biosynthesis, but not the entire pathway 39, while 
yet another, ZmMYB38, inhibits C1-mediated activation 
of the same promoter. It appears that MYB proteins are 
used to give independent regulation of the structural 
genes to produce different end products in different 
cells. 

Reiteration of MYB-gene function to give metabolic 
diversity occurs in the control of a branch of flavonoid 
metabolism producing the red phlobaphene pigments 
from intermediates in flavonoid metabolism. This path- 
way is under control of the P gene in maize, which 
encodes a MYB-related protein 17 . The P gene product 
activates a subset of the genes involved in anthocyanin 
biosynthesis (Fig. 2). The P-binding site is contained 
within the promoters of these target genes 19 , and the P 
gene product does not interact with the R-family pro- 
teins and might be able to activate transcription of its 
target genes alone. So, in maize, at least two different 
MYB proteins serve to direct flavonoid metabolism 
along different routes by selective activation of target 
genes. 

In other plant species MYB proteins serve similar 
roles in the control of phenylpropanoid metabolism as, 
for example, in Petunia flowers where the AN2 gene 
product is required for anthocyanin production and has 
recently been shown to encode a MYB-related product 40 
[F. Quattrocchio (1994) PhD Thesis, Vrije Univ. of 
Amsterdam] but, unlike C1 in maize, AN2 is not required 
for expression of genes encoding the early steps in 
anthocyanin biosynthesis (Fig. 2; Ref. 40); perhaps 
another MYB transcription factor might serve to regulate 
these early steps. One gene encoding chalcone synthase 
(CHSJ) can be activated by another MYB protein from 
Petunia, PhMYB3, which is expressed specifically in 
petal epidermis where anthocyanin pigment is made 18. 
Interestingly, there is also strong evidence that AN2 interacts 
with a bHLH protein to activate transcription 40, 
although PhMYB3 (unlike C1) can activate transcription 
alone, suggesting that it does not have an obligate 
requirement for interaction with a bHLH partner 18. 

MYB proteins can also serve to regulate other 
branches of phenylpropanoid metabolism. In Antirrhinum 
majus and tobacco AmMYB305 (or its orthologue 
in tobacco) can activate the gene encoding the 
first enzyme of phenylpropanoid metabolism, phenylalanine 
ammonia lyase (PAL; Ref. 15). However, the 
evidence for regulation of other branches of phenyl- 
propanoid metabolism by MYB proteins is, at present, 
circumstantial and rests on the significance of 
sequences related to MYB-binding sites in many of the 
promoters of structural genes in phenylpropanoid 
metabolism, and lignin biosynthesis 16 . However, some 
MYB genes have been shown to be highly expressed in 
tissues such as differentiating xylem, supporting 
the view that they serve roles in controlling the branch 
of phenylpropanoid metabolism involved in lignin 
production 41 . 

Cell shape 

The second well-established role for plant MYB 
genes is in the control of cell shape where the MIXTA 
gene of Antirrhinum and the orthologous PhMYB1 
gene from Petunia have been shown to be essential for 
developing the conical form of petal epidermal cells 
(Fig. 3) and the GL1 gene of Arabidopsis has been 
shown to be essential for the differentiation of hair cells 
(trichomes) in some parts of the leaf and in the stem 24,42 
[L. Mur (1995) PhD Thesis, Vrije Univ. of Amsterdam]. 
These two roles might be mechanistically similar 
because overexpression of MIXTA in transgenic tobacco 
results in trichome formation on petals, suggesting that 
conical petal cells might be trichoblasts arrested at an 
early stage in trichome formation (B. Glover and 
C. Martin, unpublished). 

GL1 is required for an initial expansion in the size of 
the cell that develops into the trichome, and it acts 
upstream of a number of other genes 43, mutation of 
which gives rise to cellular outgrowths that do not 
develop into full, branched trichomes. One, GL2, 
encodes a homeodomain protein that is probably a 
transcriptional activator of subsequent stages in trichome 
development 44. It is, therefore, possible that GL1 
is a direct activator of the GL2 gene. Supporting this 
idea, the GL2 gene promoter contains motifs very similar 
to the binding sites of P and AmMYB305 transcription 
factors, and these lie in a region shown to affect 
GL2 function quantitatively, presumably through affecting 
the level of GL1 expression 44. The conical cells produced 
by the action of the MIXTA gene of Antirrhinum 
resemble the limited outgrowths produced in Arabidopsis 
gl2 mutants where trichome formation is aborted. 
Perhaps the initial stages of trichome formation regulated 
by GL1 are similar to those regulated by MIXTA. 
The developmental programme giving rise to conical 
cells in petals might terminate before the onset of that 
part of the programme regulated by GL2 in trichome 
development. In its specification of trichome formation, 
GL1 is either controlled by or interacts with the product 
of the TTG gene, which is required for trichome formation 
and anthocyanin production. Overexpression of 
the maize R gene complements the ttg mutation leading 
to the suggestion that the TTG gene product is also an 
R-related protein that interacts with GL1 in a manner 
analogous to the interaction of C1 and R in maize 45. 

Two MYB proteins from fungi, the CDC5 gene product 
from Schizosaccharomyces pombe 10 and the FLBD 
gene product from Aspergillus nidulans 11 can also con- 
trol aspects of cell shape. The FLBD gene product is 
required for early conidiophore production in 
Aspergillus colonies. The initial branching of the fungal 
mycelium might have mechanistic similarities to tri- 
chome formation, and FLBD is thought to activate a cas- 
cade of transcription factors for conidiophore produc- 
tion. Clearly, assessment of these similarities in the 
cellular mode of action of such diverse MYB proteins 
requires understanding of the specific biochemical 
processes they activate. 

Response to hormones 

A more-recently defined role for plant MYB proteins 
is in hormonal responses during seed development and 
germination. A barley MYB protein (GAMYB) whose 
expression is induced by gibberellic acid (GA) has been 
shown to activate expression of a gene encoding a high 
pI α-amylase that is synthesized in barley aleurone 
upon germination for the mobilization of starch in the 
endosperm 26. Expression of GAMYB is induced by 
treatment of aleurone layers with GA and expression of 
the α-amylase gene is induced subsequently. There is a 
suggestion that other GA-inducible genes can also 
respond to activation by MYB proteins during seed germination 
because MYB-like motifs from other GA-responsive 
gene promoters have been shown to direct 
reporter gene expression in response to GA (Ref. 25). 
There is, as yet, no strong evidence for MYB protein 
involvement in other GA-induced processes in other 
parts of the plant although some MYB genes are 
expressed in response to GA treatment of Petunia 
petals [L. Mur (1995) PhD Thesis, Vrije Univ. of 
Amsterdam]. 

Figure 3. Plant phenotypes due to MYB gene mutations. 
(a) Phenotype of unstable allele of the C1 locus of maize. A 
transposable element inactivates the gene so the production of 
anthocyanin pigment in the aleurone layer of the kernel is 
suppressed. Somatic excision gives rise to pigmented revertant 
sectors. Photograph courtesy of Brian Scheffler. (b) Phenotype of 
unstable allele of MIXTA locus of Antirrhinum majus. Pigmented 
petal cells are viewed under the microscope. The transposable 
element suppresses MIXTA expression to give paler cells and 
somatic reversion gives sectors of full colour. The effect of MIXTA 
works not through control of pigment production, however, but 
through the modification of the optical properties of the petal 
epidermal cells. (c) The effect of MIXTA on cell shape. The 
MIXTA gene controls the formation of conical cells on the petal. 
Transposon insertion gives rise to flat cells with sectors of 
revertant conical cells due to somatic excision as revealed by 
scanning electron microscopy. The conical cells appear more 
darkly pigmented than the flat cells due to their optimized 
reflection and refraction properties. 

Figure 4. Dendrogram of relationships among MYB proteins 
derived from the matrix of sequence similarities calculated with 
the CLUSTALV program 51. Nomenclature for MYB proteins as 
in Fig. 1. 
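
The dendrogram in Fig. 4 starts from a matrix of pairwise sequence similarities. As a rough illustration of that input, and not of CLUSTALV's actual scoring or tree-building procedure, the Python sketch below computes simple fractional identity between four equal-length fragments read from the end of the R2 block of the Fig. 1(a) alignment. The row labels are hypothetical, because the extracted figure does not preserve which protein each row belongs to.

```python
from itertools import combinations

# Four 17-residue fragments from the Fig. 1(a) alignment; labels are placeholders.
FRAGMENTS = {
    "row_a": "RIGKQCRERWHNHLNPE",
    "row_b": "RLGKQCRERWHNHLNPE",
    "row_c": "RCGKSCRLRWLNYLRPH",
    "row_d": "RCGKSCRLRWINYLRPD",
}


def identity(a: str, b: str) -> float:
    """Fraction of identical residues between two pre-aligned, equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def identity_matrix(seqs):
    """Pairwise identities keyed by (name1, name2), one entry per unordered pair."""
    return {
        (m, n): identity(seqs[m], seqs[n])
        for m, n in combinations(seqs, 2)
    }


MATRIX = identity_matrix(FRAGMENTS)

# The most similar pair would be the first pair joined in a simple
# agglomerative clustering of these four rows.
CLOSEST = max(MATRIX, key=MATRIX.get)

if __name__ == "__main__":
    for pair, value in sorted(MATRIX.items()):
        print(pair, f"{value:.2f}")
    print("closest pair:", CLOSEST)
```

Even on these short fragments the rows fall into two tight pairs that are much less similar to each other, which is the kind of grouping a dendrogram makes visible. A real analysis would use full-length aligned domains and a substitution-aware similarity score.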

Treatment with another plant hormone, abscisic 
acid (ABA), induces expression of AtMYB1 in Arabidopsis, 
a MYB gene that is also induced in response to 
dehydration or salt stress 46. In maize, expression of the 
C1 gene is ABA-responsive, where it is involved in the 
formation of anthocyanin in the developing kernels 27. 
AtMYB2 might be responsible for activating expression 
of some drought-responsive genes because binding by 
AtMYB2 to the promoter region of a drought- or salt- 
stress-induced gene, rd22, has been demonstrated. The 
rd22 gene promoter also contains MYC-recognition 
sequences suggesting that AtMYB2 can interact with a 
bHLH protein to induce gene transcription in response 
to dehydration or salt stress 47 . 

Cellular proliferation 

None of the known functions of MYB proteins in 
plants bear much similarity to the biological roles of the 
c-MYB family in vertebrates. Part of the role of c-MYB 
in promoting cellular proliferation concerns the control 
of progression of the cell cycle from G1 to S phase 
through the regulation of CDC2 kinase. c-MYB has 
been shown to activate transcription from the CDC2 
kinase gene promoter in animal cells and so to control 
the G1-S phase transition, a role that might go part of 
the way to explain its promotional effects on cellular 
proliferation 48. No such role has yet been demonstrated 
for a plant MYB gene product although the CDC2a 
gene from Arabidopsis has been shown to contain 
MYB recognition motifs within its promoter. These 
sequences lie in regions that enhance the level of 
expression driven by the promoter as demonstrated by 
reporter gene fusions 49. Perhaps there are MYB genes 
in plants with functions more closely analogous to their 
animal counterparts. There is certainly a subclass of 
MYB proteins (including AtMYB1 from Arabidopsis, 
Fig. 4) that bear greater structural similarity to vertebrate 
MYB proteins than to the other plant MYB proteins, and 
this structural similarity could reflect homologous cellular 
functions. 

Why are there so many plant MYB proteins? 

The pervasiveness of MYB-related genes in all major 
groups of eukaryotic organisms suggests that proteins 
with MYB-like DNA-binding domains developed early 
to regulate gene expression. Different types of MYB 
protein might then have evolved as a result of dupli- 
cation or triplication of the basic repeat unit. It has been 
proposed that evolution has occurred mostly through 
modification of regulation of common structural genes, 
and the separation between different groups of eukary- 
otes might be accompanied by the differential use of 
the transcriptional factor classes. This does not, in itself, 
explain why plants have made such extensive use of 
MYB proteins, and it might well be that MYB genes 
have been duplicated and their functions expanded in 
conjunction with the development of novel functions in 
higher plants. 

Although fungi and bryophytes contain MYBs with 
two repeats, suggesting that R2R3-type proteins were 
early forms of MYB available for controlling gene 
expression, it is the size of the R2R3-type MYB gene 
family in higher plants that is particularly remarkable. In 
one lower plant, the moss Physcomitrella patens, the 
MYB protein family has, in fact, been estimated to be 
small, with only two to three gene members 50 . MYB 
gene function might have diversified in parallel to 
increasing complexity in developmental and metabolic 
pathways as, for example, in phenylpropanoid metabolism 
and also in transcriptional responses to hormones, 
such as gibberellic acid and abscisic acid, which 
are, themselves, specialized plant signalling molecules 
generated from secondary metabolites. So, plants 
appear to have used R2R3-type MYB transcription 
factors selectively to control their specialized 
physiological functions, while in contrast, vertebrates 
have developed only one small group of MYB proteins 
to control cellular proliferation and differentiation. 
However, only about 10% of the plant MYB genes have 
some attributed function, so the full extent of the 
participation of this transcription factor gene family in 
plant growth and development is only just becoming 
realized, as new and diverse functions are defined for 
its members. 

Arabidopsis Transcription 
Factors: Genome-Wide 
Comparative Analysis Among 
Eukaryotes 

J. L. Riechmann,* J. Heard, G. Martin, L. Reuber, C.-Z. Jiang, 
J. Keddie, L. Adam, O. Pineda, O. J. Ratcliffe, R. R. Samaha, 
R. Creelman, M. Pilgrim, P. Broun, J. Z. Zhang, D. Ghandehari, 
B. K. Sherman, G.-L. Yu 

The completion of the Arabidopsis thaliana genome sequence allows a 
comparative analysis of transcriptional regulators across the three eukaryotic 
kingdoms. Arabidopsis dedicates over 5% of its genome to code for more than 1500 
transcription factors, about 45% of which are from families specific to plants. 
Arabidopsis transcription factors that belong to families common to all 
eukaryotes do not share significant similarity with those of the other kingdoms 
beyond the conserved DNA binding domains, many of which have been arranged 
in combinations specific to each lineage. The genome-wide comparison reveals 
the evolutionary generation of diversity in the regulation of transcription. 



Regulation of gene expression at the level of 
transcription influences or controls many of 
the biological processes in a cell or organism, 
such as progression through the cell cycle, 
metabolic and physiological balance, and re- 
sponses to the environment. Development is 
based on the cellular capacity for differential 
gene expression and is often controlled by 
transcription factors acting as switches of 
regulatory cascades (1). In addition, alterations 
in the expression of genes coding for 
transcriptional regulators are emerging as a 
major source of the diversity and change that 
underlie evolution (2). 

With the completion of the Arabidopsis 
thaliana genome sequence, the entire complement 
of genes coding for transcription factors 
from a plant can be identified and described. 
Together with the three other eukaryotic 
genomes that have already been sequenced, 
it also allows investigation of the 
similarities and differences in transcriptional 
regulators among the three eukaryotic kingdoms: 
plants, animals (Caenorhabditis elegans 
and Drosophila melanogaster) (3, 4), 
and fungi (Saccharomyces cerevisiae) (5). 
We present such a description and analysis 
here. 

Gene Content and Organization 

To characterize the entire complement of 
transcription factors encoded by the genomes 
of Arabidopsis, Drosophila, C. elegans, and 
S. cerevisiae, we used a comprehensive list of 
proteins, domains, and motifs to query the 
corresponding sequence databases. Tran- 
scription factors are usually defined as pro- 
teins that show sequence-specific DNA bind- 
ing and are capable of activating and/or re- 
pressing transcription. Although most of the 
proteins and protein families that were con- 
sidered in our study fit these criteria, we have 
also included some other types of transcrip- 
tional regulators. Most known transcription 
factors can be grouped into families according 
to their DNA binding domain (6). Protein 
domains that are sometimes present in tran- 
scription factors, but not necessarily associ- 
ated with them, have not been included in this 
genome survey, for example, some zinc co- 
ordinating motifs that either are involved in 
protein-protein interactions or have not yet 
been functionally characterized. 

We searched the Drosophila, C. elegans, 
and yeast encoded protein complements 
(proteomes) using BLAST and motif-finding 
programs (7). Because the complete predicted 
proteome of Arabidopsis was not available at 
the time of the analysis, we used the entire set 
of genomic sequences (7). 

The Arabidopsis genome codes for at least 
1533 transcriptional regulators, which account 
for ~5.9% of its estimated total number 
of genes (Table 1). We identified 635, 
669, and 209 transcriptional regulators in the 
proteomes of Drosophila, C. elegans, and 
yeast, respectively (4.5, 3.5, and 3.5%). Thus, 
the Arabidopsis content of transcription factors 
is 1.3 times that of Drosophila and 1.7 
times that of C. elegans and yeast. These 
results represent an underestimate of the total 
number of transcription factors in these organisms. 
Approximately 40 to 50% of the 
proteins encoded by each of those genomes 
cannot be assigned to functional categories 
on the basis of sequence similarity to proteins 
of known function (3, 8-11). Some of those 
uncharacterized proteins are expected to be 
transcriptional regulators (12, 13). The large 
number and diversity of transcription factors 
in Drosophila were proposed to be related to 
its substantial regulatory complexity (4). Applying 
the same logic to Arabidopsis suggests 
that the regulation of transcription in plants is 
as complex as that in Drosophila. In contrast 
to Drosophila and C. elegans, for which a 
sizable (>25%) fraction of their known transcription 
factors have been characterized genetically 
(14), only ~5% of those from Arabidopsis 
have been defined by mutation analysis (15). 
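
The percentages and fold differences quoted above follow directly from the counts in Table 1. As a minimal sketch (the function and variable names are illustrative, and the gene totals are the approximate figures given in Table 1), the arithmetic can be checked as follows:

```python
# Counts from Table 1: (transcriptional regulator genes, approximate total genes).
GENOMES = {
    "A. thaliana": (1533, 26_000),
    "D. melanogaster": (635, 14_000),
    "C. elegans": (669, 19_000),
    "S. cerevisiae": (209, 6_000),
}


def regulator_percentage(regulators, total_genes):
    """Share of the genome's genes that code for transcriptional regulators."""
    return 100.0 * regulators / total_genes


# Round to one decimal place, as the percentages are reported in the text.
percentages = {
    name: round(regulator_percentage(regs, total), 1)
    for name, (regs, total) in GENOMES.items()
}


def fold_relative_to(reference, other):
    """Ratio of two rounded percentages, e.g. Arabidopsis versus Drosophila."""
    return round(percentages[reference] / percentages[other], 1)


if __name__ == "__main__":
    for name, pct in percentages.items():
        print(f"{name}: {pct}%")
    print(fold_relative_to("A. thaliana", "D. melanogaster"))
    print(fold_relative_to("A. thaliana", "C. elegans"))
```

Because the gene totals are estimates, the results are only as precise as those totals; the ratios here are computed from the rounded percentages, which is an assumption about how the quoted fold values were obtained.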

Arabidopsis contains many tandem gene 
duplications and large-scale duplications on 
different chromosomes, which might account 
for >60% of the genome (9, 10, 16). Whereas 
some of these duplications have been followed 
by rearrangements and divergent evolution, 
up to 40% of the Arabidopsis genes 
might comprise pairs of highly related sequences 
(16). In that respect, Arabidopsis is 
similar to the three other eukaryotic organisms. 
The S. cerevisiae genome is the result 
of a complete ancient genome duplication 
that was followed by extensive gene rearrangements 
and deletions (17). In yeast, 
~30% of the genes form duplicate gene pairs. 
Similarly, duplicated genes account for ~48 
and ~40% of the total gene content of C. 
elegans and Drosophila, respectively (11). 

All of the Arabidopsis transcription factor 
gene families are scattered throughout the 
genome. On average, closely related genes 
account for ~45% of the total number in the 
major families (Table 2) (18). Gene duplications 
on different chromosomes are most 
common (~65%), but duplicated genes are 
also frequently found at large distances in the 
same chromosome (~22%) as well as organized 
in tandem repeats (~13%) (19). Clusters 
of three or more highly related genes are 
very rare (Table 2). 
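
The three duplication categories used here are defined in the Table 2 caption: tandem duplications lie within 50 kb of each other, and pairs on the same chromosome but more than 50 kb apart are counted separately from pairs on different chromosomes. The sketch below applies that rule; the function names and example coordinates are hypothetical and only illustrate the classification, not the analysis pipeline that was actually used.

```python
from collections import Counter

# 50 kb cutoff taken from the Table 2 caption.
TANDEM_MAX_DISTANCE_BP = 50_000


def classify_duplication(chrom_a, pos_a, chrom_b, pos_b):
    """Assign a pair of related genes to one of the three Table 2 categories."""
    if chrom_a != chrom_b:
        return "different chromosomes"
    if abs(pos_a - pos_b) <= TANDEM_MAX_DISTANCE_BP:
        return "tandem"
    return "same chromosome"


def duplication_percentages(pairs):
    """Percentage of gene pairs that fall into each category."""
    counts = Counter(classify_duplication(*pair) for pair in pairs)
    total = sum(counts.values())
    return {kind: round(100 * n / total) for kind, n in counts.items()}


# Hypothetical pairs: (chromosome, position in bp, chromosome, position in bp).
example_pairs = [
    (1, 120_000, 1, 150_000),    # 30 kb apart on chromosome 1
    (2, 500_000, 2, 4_000_000),  # 3.5 Mb apart on chromosome 2
    (3, 10_000, 5, 10_000),      # chromosomes 3 and 5
    (4, 900_000, 1, 20_000),     # chromosomes 4 and 1
]

if __name__ == "__main__":
    print(duplication_percentages(example_pairs))
```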

Transcription Factors Across the 
Eukaryotic Kingdoms 

Two features stand out when comparing the 
Arabidopsis complement of transcriptional 
regulators with that of the other organisms 
(Table 3). First, ~22% of the Arabidopsis 
transcription factors are zinc-coordinating 
proteins [belonging to several different families 
that are thought to have evolved independently 
(20)]. In contrast, zinc-coordinating 
proteins constitute most of the transcription 
factors in the three other eukaryotes: 
~51% in Drosophila, ~64% in C. elegans, 
and 56% in yeast. Second, in Arabidopsis, 
there is no single family of transcription factors 
that has been so disproportionately amplified 
as the nuclear hormone receptors in C. 
elegans (~38% of its transcription factors), 
the C2H2 zinc finger proteins in Drosophila 
(~46%), or the C6 and C2H2 families in 
yeast (~25% each). The three largest 
families of transcription factors in Arabidopsis, 
AP2/EREBP (APETALA2/ethylene responsive 
element binding protein), MYB-(R1)R2R3, 
and bHLH (basic helix-loop-helix), 
each represent only ~9% of the total, 
and there are several other families with comparable 
numbers of genes. 

Each eukaryotic lineage has its own set of 
particular transcription factor families and 
genes [comparing such a small number of 
genomes represents a limitation for this type 
of analysis (21)] (Table 3). The lineage-specific 
families are of interest from an evolutionary 
point of view. According to molecular 
phylogenetic analyses, plants, animals, 
and fungi all diverged from a common ancestor 
during a short period of time, ~1.5 billion 
years ago (15). Thus, it would be expected 
that most of the transcription factor families 
would either be shared by the three lineages, 
if they were present in the common ancestor, 
or specific to each lineage, if they arose 
independently following divergence. This is 
indeed the case (Table 3). Members of lineage-specific 
families represent 45% of the 
Arabidopsis transcription factors, 47% in C. 
elegans, and 32% in yeast (but only 14% in 
Drosophila, because of its extensive use of 
the C2H2 zinc finger proteins). Families that 
are present in all four organisms account for 
most of the remaining transcription factors in 
each case. 

There are, however, a few exceptions to 
this expected pattern: some genes and gene 
families are present in two of the three lin- 
eages. Transcription factors and transcription 
factor families that are present in Drosophila, 
C. elegans, and yeast (but are absent from 
Arabidopsis) include the SOX/TCF (SRY-related 
HMG box/T cell factor) group, the 
fork head-type/winged-helix proteins, and 
homologs of the human transcription factor 
RFX1 (Table 3). The SOX/TCF group, 
which includes developmental regulators like 
human SRY (sex-determining region Y) and 
TCF and the yeast hypoxic-gene regulator 
ROX1, forms part of the HMG-box (high- 
mobility group) superfamily of proteins (22). 
In contrast to other HMG-box proteins that 
act as architectural components of chromatin 
and have no sequence specificity on their 
own, the SOX/TCF factors show sequence-specific 
DNA binding and transactivation activities. 
There are 14 genes in the Arabidopsis 
genome encoding HMG box-containing pro- 
teins, but phylogenetic analyses indicate that 
none of these proteins belong to the SOX/ 
TCF group (15). 

In contrast to the examples described 
above, there does not appear to be any case of 
transcriptional regulators that are present in 
both yeast and Arabidopsis but absent from 
animals. This distribution of genes and gene 
families in the three eukaryotic lineages is in 
agreement with the notion that animals and 
fungi are more closely related to each other 
than to plants (23). There are at least three 
classes of transcription factors that are 
present in plants and animals but absent from 
yeast: TUBBY-like (TUB), CPP-like (cysteine-rich 
polycomb-like protein), and E2F/DP 
proteins (13, 24, 25) (Table 3). It remains to 
be determined whether these classes of genes 
were specifically lost from the S. cerevisiae 
genome or if they are really absent from the 
fungal lineage. 

There are many transcription factor fami- 
lies that are found only in plants, some of 
which have been greatly amplified. These 
include the AP2/EREBP (26), NAC (27), 
and WRKY families (28); the trihelix DNA 
binding proteins (29); the auxin response fac- 
tors (ARFs); the Aux/IAA proteins [which do 
not bind to DNA by themselves, but interact 
with the ARF proteins (30)]; and other small- 
er families (Table 3). Similarly, animals and 
yeast have many families of transcription fac- 
tors that are not found in plants (Table 3). 

A lingering question when considering 
protein families that appear to be exclusive to 
one lineage is whether their signature do- 
mains are true evolutionary innovations or 
whether their relationships with other pro- 
teins have been blurred because their amino 
acid sequences (but not their three-dimensional 
structures) have diverged substantially 
over time. Some of the plant-specific families 
of transcriptional regulators are characterized 
by domains that appear to be genuine novel- 
ties. For example, the AP2 domain exhibits a 
new mode of DNA recognition by a β-sheet 
structure (31). Other transcription factors 
classified as specific to plants, however, 
might be related to proteins found in other 
organisms. The plant-specific GRAS proteins 
might be distant relatives of the animal-specific 
STATs, based on a similar arrangement 
of related functional domains (32). The trihelix 
DNA-binding domain, present only in 
plants, might have evolved from the MYB 
domain, found in all eukaryotes (29). 

Table 1. Content of transcriptional regulator genes in eukaryotic genomes. The number of genes in each 
of the eukaryotic genomes is given as an approximate number. This is because the number of genes 
predicted at the time that a genome is sequenced is always an estimate that is refined over time (7). 

                                     Genes coding for transcriptional regulators
Organism          Total number       Total      Percentage of total
                  of genes           number     number of genes
A. thaliana       ~26,000            1533       5.9
S. cerevisiae     ~6,000             209        3.5
C. elegans        ~19,000            669        3.5
D. melanogaster   ~14,000            635        4.5

Table 2. Gene duplications in Arabidopsis transcription factor families. The major families of Arabidopsis 
transcription factors were analyzed for the presence of pairs or groups of highly related genes (18). The 
families analyzed together comprise over 1000 genes. Tandem duplications are arbitrarily defined as 
those that occur within a sequence distance of 50 kb. If two genes are duplicated in the same 
chromosome but reside >50 kb apart from each other, they are counted in the "Duplications in the same 
chromosome" column. (Zn) indicates a zinc coordinating DNA binding motif. 

Gene family     Percentage of   Tandem         Duplications   Duplications    Number of gene clusters/
                genes with      duplications   in same        in different    number of genes in
                close homolog   (%)            chromosome     chromosomes     cluster (chromosome)
                                               (%)            (%)
MYB-(R1)R2R3    44              7              28             65              0
AP2/EREBP       45              11             39             50              1/3 (4)
bHLH            42              …              …              74              …
NAC             …               …              …              63              1/5 (1)
…               …               …              …              …               1/3 (3)
…               …               …              24             71              0
MADS            50              30             32             38              1/4 (5)
bZIP            …               13             22             65              1/3 (5)
WRKY (Zn)       33              12             17             71              1/3 (1)
GARP            48              0              8              92              0
Dof (Zn)        37              33             17             50              1/4 (4)
CO-like (Zn)    52              13             13             74              0
GATA (Zn)       50              0              0              100             0
Total           44              13             22             65              NA

The two transcription factor families that 
have been more substantially amplified in 
Arabidopsis, as compared to animals and 
yeast, are the MYB and the MADS families. 
The MYB motif consists of a helix-turn-helix 
structure with three regularly spaced Trp res- 
idues. In Arabidopsis, almost all of the MYB 
proteins belong to the MYB-R2R3 class (131 
members): they contain two imperfect repeats 
of the MYB motif (33). MYB-R1R2R3 pro- 
teins, which are the norm in animals, are rare 
in Arabidopsis (five proteins). The plant-specific 
R2R3 organization is thought to have 
evolved from an R1R2R3-type ancestral gene 
from which the first repeat was lost (34). 
Because the plant MYB-R1R2R3 proteins are 
more closely related to the animal MYB pro- 
teins than to the plant proteins of the R2R3 
type, it has been suggested that they might 
have functions related to those of the MYB 
proteins in animals, such as the control of cell 
proliferation (34, 35). Conversely, MYB-R2R3 
proteins might have evolved to regulate 
processes specific to plants, including 
secondary metabolism, responses to plant 
hormones, and the identity of specific cell 
types. 

In addition to the MYB-(R1)R2R3 proteins, 
Arabidopsis contains additional transcription 
factors characterized by a more divergent 
MYB domain, which is present either 
as a single copy or as a repeat. These proteins 
form a heterogeneous group and are often 
referred to as "MYB related." For the purpose 
of clarity, we have divided the Arabidopsis 
MYB-related proteins into several subclasses 
in Fig. 1 (15). 

More distant but also related to the MYB 
superfamily is a previously unidentified 
group of proteins that we propose to name 
"GARP," after maize GOLDEN2, the ARR 
B-class proteins from Arabidopsis, and 
Chlamydomonas Psr1 (36-39) (Fig. 1). 
These proteins appear to be involved in plant-specific 
processes: GOLDEN2 controls the 
differentiation of a photosynthetic cell type of 
the maize leaf, whereas Psr1 is a regulator of 
phosphorus metabolism. 

Arabidopsis also contains many more heat 
shock transcription factors (HSFs) than does 
Drosophila, C. elegans, or yeast. Plant HSFs 
exhibit structural and functional characteristics 
specific to that lineage (40, 41). 

Table 3. Eukaryotic transcriptional regulators. Number of transcriptional regulators 
in Arabidopsis (A.t.), Drosophila (D.m.), C. elegans (C.e.), and S. 
cerevisiae (S.c.), classified by families on the basis of sequence similarity. The 
table is nonredundant: proteins are counted only once, regardless of whether 
they have more than one signature motif. The way in which proteins that combine 
different DNA binding motifs were organized into families is reflected in Fig. 1. 
Families that are specific to one lineage are indicated in color. Families are 
listed under "Transcription factors" or "Other transcriptional regulators," as 
described in the text. However, this distinction is not without problems (for 
example, the ARID and HMG-box families). Information about the signature 
motif(s) or sequences that define each family is provided as an InterPro (IPR) 
or GenBank accession number (56). (Zn) indicates a zinc coordinating DNA 
binding motif. In the bHLH class, only proteins with a discernible basic region 
were included. "Other" includes some single-copy genes and small families 
that are not individually mentioned in the text. The results of the database 
searches (P, motif searches; B, BLAST) and sequence comparisons were 
inspected by eye. The numbers reported here might therefore differ from 
other large-scale classifications that are performed automatically (11). 

For those transcription factor families that 
are common to all eukaryotes, how similar 
are the Arabidopsis proteins to those from the 
other organisms? Each Arabidopsis transcrip- 
tion factor was compared to the proteomes of 
Drosophila, C. elegans, and yeast by using 
the BLASTX and BLASTP programs. The 
analysis revealed that Arabidopsis transcrip- 
tion factors do not share significant similarity 
with those from the other lineages, except in 
the conserved DNA binding domains that 
define the respective families. The only Arabidopsis 
proteins that showed similarity beyond 
the threshold of significance established 
in the comparison (42) were some homologs 
of the HAP3 subunit of the CCAAT-box 
binding factor and a MYB-related protein 
known to be homologous to the S. cerevisiae 
CEF1 and S. pombe Cdc5 proteins (43, 44). 
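
One way to picture the comparison described above is to ask, for each significant BLAST hit, whether the aligned region of the Arabidopsis protein extends outside its DNA binding domain. The sketch below is only an illustration under stated assumptions: it reads the standard 12-column BLAST tabular output of a BLASTP search, the domain coordinates are supplied by the caller, and the E-value and length cutoffs are hypothetical placeholders rather than the threshold used in the study (42).

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    q_start: int   # first aligned residue in the query (1-based)
    q_end: int     # last aligned residue in the query (inclusive)
    evalue: float


def parse_blast_tabular(line):
    """Parse one line of standard 12-column BLAST tabular output."""
    fields = line.rstrip("\n").split("\t")
    return Hit(fields[0], fields[1], int(fields[6]), int(fields[7]),
               float(fields[10]))


def residues_outside_domain(hit, domain_start, domain_end):
    """Count aligned query residues that lie outside the binding domain."""
    aligned = set(range(hit.q_start, hit.q_end + 1))
    domain = set(range(domain_start, domain_end + 1))
    return len(aligned - domain)


def similar_beyond_domain(hit, domain_start, domain_end,
                          max_evalue=1e-10, min_outside=30):
    """Hypothetical criterion: significant hit plus enough aligned residues
    outside the domain. Both cutoffs are assumptions for illustration."""
    outside = residues_outside_domain(hit, domain_start, domain_end)
    return hit.evalue <= max_evalue and outside >= min_outside


if __name__ == "__main__":
    # Invented example hit; identifiers and values are not real data.
    line = "AtTF1\tDmTF9\t45.0\t60\t30\t1\t100\t159\t20\t79\t1e-20\t90.0"
    hit = parse_blast_tabular(line)
    print(similar_beyond_domain(hit, domain_start=100, domain_end=160))
```

A hit confined to the domain, as in the invented example, is reported as not similar beyond it; a hit whose alignment spans flanking sequence would pass the same check.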

Domain Shuffling 

The modular nature of transcription factors 
and the importance of domain shuffling in 
protein evolution are both well established. 
The characterization of the entire comple- 
ment of Arabidopsis transcription factors al- 
lows consideration of the extent of domain 
accretion, shuffling, and divergence in these 
proteins and reveals the relationships among 
the different families at a genome-wide scale 
(Fig. 1). 

Fig. 1. Relationships and domain shuffling among the different Arabidopsis transcription factor 
families. Gene families are represented by circles, whose size is proportional to the number of 
members in the family. Domains that have been shuffled and that therefore "connect" different 
groups of transcription factors are indicated with rectangles, whose size is proportional to the 
length of the domain. DNA binding domains are colored; other domains (usually protein-protein 
interaction domains) are shown with hatched patterns. Dashed lines indicate that a given domain 
is a characteristic of the family or subfamily to which it is connected. Gene names are written in 
italics. Whereas many of the indicated domain-shuffling events are specific to plants, others likely 
predate the appearance of the three distinct eukaryotic lineages. For an expanded version of this 
figure and the information that was used to construct it, see supplemental material (15). 

Shuffling of some of the DNA binding 
domains that are present in all eukaryotes has 
generated novel transcription factors with 
plant-specific combinations of modules. This 
is well illustrated by the homeodomain proteins. 
In ~50% of the members of the Arabidopsis 
homeobox family, the homeodomain 
is followed by a leucine zipper (Fig. 1). 
This combination of motifs is not observed in 
the yeast or animal homeodomain proteins. 
The only Arabidopsis homeodomain proteins 
that have an additional motif also found in 
animal homeodomain proteins are those of 
the KNOX class, which contain a MEINOX 
domain (Fig. 1) (45). On the other hand, 
homeodomains in animals are associated with 
a large variety of motifs, such as the paired 
and POU-specific domains, the LIM motif, or 
C2H2 zinc fingers, in combinations that are 
not present in Arabidopsis. Some of these 
domains (paired and POU) are specific to 
animals. 

Other examples of plant-specific arrange- 
ments of common domains include the 
MADS, YABBY, and ARID families. The 
ARID (for AT-rich interaction domain) motif 
is found in animals in a variety of develop- 
mental and cell-cycle regulators, like the 
Drosophila Dead ringer and Osa proteins 
(46). In animal ARID proteins, that domain is 
combined with other motifs, like PHD fingers 
or the jumonji domain (47). In the Arabidop- 
sis ARID proteins, the ARID domain is as- 
sociated with an HMG box, whereas PHD 
fingers and the jumonji domain form other 
combinations (Fig. 1). Some animal ARID 
proteins, like Bright, exhibit sequence-specif- 
ic DNA binding, whereas others, like Osa, do 
not. Osa, however, modulates the activity of 
the SWI/SNF Brahma complex to promote 
the activation of specific target genes (46). 

MADS domain proteins in plants were 
first identified as regulators of floral organ 
identity and have since been found to control 
additional developmental processes, such as 
meristem identity, root development, fruit de- 
hiscence, and flowering time (48, 49). A 
characteristic of the plant MADS domain 
proteins that sets them apart from their ani- 
mal and fungal counterparts is a modular 
organization containing a distinct coiled-coil 
domain (K box). The Arabidopsis genome 
sequence, however, has revealed that there is 
an additional class of plant MADS domain 
proteins in which the K box is absent (50). 
Phylogenetic analyses indicate that a gene 
duplication event, ancestral to the divergence 
of plants and animals, generated two MADS- 
box gene lineages that are now present in all 
eukaryotes. In plants, one lineage resulted in 
MADS proteins with a K box, whereas the 
other resulted in proteins that lack it (50). 
This conclusion, which was based on sequence 
phylogeny, is also supported by the 
structure of the genes. K box-containing 
MADS-box genes have multiple exons, the 
MADS box being completely encompassed 
in one of them. However, analysis of the 
Arabidopsis genomic sequence indicates that 
MADS-box genes lacking a K box have a 
simpler structure, with fewer or no introns. 
Drosophila and C. elegans each have two 
MADS-box genes, one per lineage. In Arabi- 
dopsis, in which at least 82 MADS-box genes 
can be identified, both classes have been 
substantially amplified (Fig. 1). 

It has been proposed that the complexity 
in protein domain organization increases with 
the complexity of the organism (11). The 
above examples of domain shuffling and ac- 
cretion suggest that, at least among transcrip- 
tion factors, plants are as complex as animals 
in this respect. 

Together with the lineage-specific gener- 
ation of novel classes of transcription factors 
or the specific amplification and divergence 
in one lineage of a common type of regula- 
tor, development of novel functions might 
also result from the organization of transcrip- 
tion factors in novel networks of protein- 
protein interactions, perhaps as a conse- 
quence of domain-shuffling events. For ex- 
ample, the animal-specific Smad proteins de- 
pend on interactions with other transcription 
factors to compensate for their relatively low 
DNA binding sequence specificity (51). 
These factors include the vertebrate winged- 
helix protein Fast-1 (winged-helix proteins 
are found in animals and in fungi) and the 
Xenopus homeodomain proteins Mixer and 
Milk. The Smad-Mixer/Milk interaction has 
been proposed to mediate mesoendodermal 
induction (52). All of these Smad-interacting 
proteins of different classes (Fast-1, and Mixer 
and Milk) share a short Smad-interaction 
motif (52) that appears to be specific to ver- 
tebrates: it is not found in Drosophila, C. 
elegans, Arabidopsis, or yeast proteins. More 
examples of this kind will be uncovered as 
the networks of protein-protein interactions 
among transcription factors are deciphered. 

Functional Diversity 

The differences in transcription factor con- 
tent, sequence, and structure among the three 
eukaryotic lineages are also accompanied by 
functional diversity. Equivalent or similar bi- 
ological functions can be controlled by dif- 
ferent families of transcription factors in each 
lineage. Conversely, DNA binding domains 
that are found in all three eukaryotic king- 
doms often control different functions in each 
one. Developmental regulators illustrate this 
point. There are also cases, however, in 
which the involvement of a gene or family in 
a particular biological function has been 
maintained across the three lineages (for ex- 
ample, the HSF family). 

Pattern formation is an obligate requirement 
in the development of complex multicellular 
organisms. In animals, determination 
of regional identity and specification of the 
body plan are achieved through the localized 
activities of homeodomain proteins. Similar 
functions in plants, meristem patterning and 
floral organ identity determination, rely on 
the domain-specific expression of a subset of 
MADS-box genes (48, 49). Therefore, two 
different transcription factor families have 
been used for similar developmental func- 
tions in the two lineages. 

Patterning depends on a system of axes. 
The dorsoventral polarity of Drosophila has 
been likened to the dorsoventral asymmetry 
of zygomorphic flowers and could also be 
conceptualized as being similar to the adaxi- 
al-abaxial polarity of the plant lateral organs. 
In all of these cases, polarity is established 
through the regionally localized expression or 
accumulation of transcription factors, but 
those belong to different classes. Floral 
asymmetry in Antirrhinum is dependent on 
the activities of CYC and DICH, two mem- 
bers of the plant-specific family of transcrip- 
tion factors TCP (53, 54). Transcription fac- 
tors of another plant-specific family, YABBY, 
are involved in establishing the adaxial-abaxial 
polarity of the plant lateral organs, together 
with other genes like PHAN, a MYB-related 
protein (55). In Drosophila, embryonic dorsoventral 
polarity is established through a gradient 
of Dorsal, a transcription factor of the NF-κB/ 
Rel/Dorsal group (NF-κB, nuclear factor κB). 
NF-κB/Rel/Dorsal proteins are found in Drosophila 
and mammals but not in C. elegans, 
yeast, or plants. 

Conclusion 

Each eukaryotic lineage has invented a siz- 
able fraction of its own transcriptional regu- 
lators. DNA binding domains that are con- 
served in sequence and structure have been 
rearranged in different ways to create novel 
proteins. The degree of domain shuffling 
among transcription factors is large. In many 
instances, families that are common to the 
three kingdoms have been used for different 
or novel processes in each of the lineages. 
The picture that emerges from the compari- 
son of the entire complement of transcription 
factors of Arabidopsis, Drosophila, C. ele- 
gans, and S. cerevisiae is one of diversity. 

rion, 494 probe sets, representing 453 genes 
or 6% of the genes on the chip, were classified 
as cycling (Web table 1) (8); 28% of 
these genes have not been characterized, and 
no conclusions can be drawn about their 
function. More than 20 of the known genes 
we found to be clock-regulated have been 
previously reported to be under circadian 
control (3, 10), validating our experimental 
methods. 

We placed the cycling genes into phase 
clusters of peak expression time. All six pos- 
sible phases (given our 4-hour time resolu- 
tion) were well represented, although there 
were fewer genes peaking at CT 16 (11) than 
in other phases [Web table 1 and Web fig. 2 
(8)]. This is in contrast to cyanobacteria, in 
which 80% of circadian-regulated genes peak 
near subjective dusk (12). Many of the genes 
we found to cycle can be clustered into func- 
tional groups on the basis of their known and 
predicted physiological roles. 
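
With samples taken every 4 hours, a 24-hour cycle has six possible peak phases (CT 0, 4, 8, 12, 16, and 20). The sketch below shows one simple way to assign genes to such phase clusters, by folding a time course onto 24 hours and taking the phase with the highest mean expression. This is an assumed, simplified stand-in for illustration; the clustering procedure actually used is not described in this passage, and the gene names and values are invented.

```python
from collections import defaultdict

SAMPLING_INTERVAL_H = 4
# Six possible phases at 4-hour resolution: CT 0, 4, 8, 12, 16, 20.
PHASES = list(range(0, 24, SAMPLING_INTERVAL_H))


def peak_phase(timecourse):
    """Return the circadian time (hours, modulo 24) with the highest mean level.

    `timecourse` maps sampling time in hours to an expression level and may
    cover more than one cycle; samples 24 h apart are averaged together.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hour, level in timecourse.items():
        phase = hour % 24
        sums[phase] += level
        counts[phase] += 1
    means = {phase: sums[phase] / counts[phase] for phase in sums}
    return max(means, key=means.get)


def cluster_by_phase(genes):
    """Group gene names by their peak phase."""
    clusters = {phase: [] for phase in PHASES}
    for name, timecourse in genes.items():
        clusters.setdefault(peak_phase(timecourse), []).append(name)
    return clusters


# Invented time courses spanning two cycles (hours 0 to 44).
example_genes = {
    "geneA": {0: 1, 4: 9, 8: 3, 12: 1, 16: 1, 20: 1,
              24: 1, 28: 8, 32: 2, 36: 1, 40: 1, 44: 1},
    "geneB": {0: 2, 4: 1, 8: 1, 12: 6, 16: 2, 20: 1,
              24: 2, 28: 1, 32: 1, 36: 7, 40: 2, 44: 1},
}

if __name__ == "__main__":
    print(cluster_by_phase(example_genes))
```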

Clock-controlled anticipation of dawn 
and dusk. A large cluster of genes implicated 
in the light- harvesting reactions of photosyn- 
thesis were found to be under clock control. 
mRNAs encoding four LHCA and seven 
LHCB proteins, chlorophyll binding proteins 
that funnel light energy to the reaction centers 
of photosystems I and II, were cycling (Fig. 
1A). Also, mRNA encoding an enzyme (pro- 
toporphyrin IX magnesium chelatase) in- 
volved in the synthesis of their ligand, chlo- 
rophyll, was cycling (Web table 1) (8). Seven 
photosystem I reaction center genes and three 
photosystem II reaction center genes were 
likewise cycling (Fig. 1B). These 22 photo- 
synthesis genes exhibit striking coregulation, 
with most peaking around midday at CT4 (9). 
Two LHC genes, the reaction center gene 
PSAD1, and the magnesium chelatase gene 
have been previously reported to cycle (10, 
13). 

Light also regulates growth and develop- 
ment and resets the circadian clock. Genes 
encoding phytochrome B (PHYB), crypto- 
chrome 1 (CRY1), cryptochrome 2 (CRY2), 
and phototropin (NPH1) (Web fig. 3A) (8) 
were clock-regulated. Homologs of the blue 
light photoreceptor genes CRY1 and CRY2 
are also clock-controlled in animals (14). 
Downstream mediators of phototransduction 
pathways, SPA1 and RPT2, were also clock- 

Circadian rhythms control processes ranging 
from human sleep-wake cycles to cyanobac- 
terial cell division. This is made possible by 
the circadian clock, an internal biochemical 
oscillator. The circadian clock allows organ- 
isms to anticipate daily changes in the envi- 
ronment such as the onset of dawn and dusk, 
providing them with an adaptive advantage 
(1). Physiological processes regulated by the 
clock in higher plants include photoperiodic 
induction of flowering (2) and rhythmic hy- 
pocotyl elongation, cotyledon movement, and 
stomatal opening (3). A small number of 
genes regulated by the clock have been found 
in an essentially serendipitous fashion (4, 5). 
However, a global examination of genes con- 
trolled by the clock in plants, or in any eu- 
karyote, has been lacking. 

The circadian clock regulates hundreds 
of genes. We have used highly reproducible 
oligonucleotide-based arrays (6) to determine 
steady-state mRNA levels in Arabidopsis at 
4-hour intervals during the subjective day and 
night. We examined temporal patterns of 
gene expression in Arabidopsis plants under 
constant light conditions using GeneChip ar- 
rays representing about 8200 different genes. 
We hybridized duplicate microarrays with 
biotin-labeled probes derived from plant tis- 
sues harvested every 4 hours over 2 days (7). 
Reproducibility between arrays was excellent 
(Web fig. 1) (8). The mean hybridization 
signal strength and the standard error of 
the mean for each probe set at each time 
point were calculated from the duplicate 
hybridizations. 
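
As an illustration of this summary step, the following minimal sketch computes the mean and the standard error of the mean (sample standard deviation divided by the square root of the number of replicates) for one probe set at each time point. The signal values and the function name `mean_and_sem` are hypothetical, and the sketch makes no claim about the software actually used in the study.

```python
import math
import statistics

def mean_and_sem(replicates):
    """Return (mean, standard error of the mean) for replicate signals."""
    n = len(replicates)
    if n < 2:
        raise ValueError("need at least two replicate hybridizations")
    mean = statistics.fmean(replicates)
    # Sample standard deviation (n - 1 denominator) over sqrt(n).
    sem = statistics.stdev(replicates) / math.sqrt(n)
    return mean, sem

# Hypothetical duplicate signals for one probe set at three time points.
duplicates = [(120.0, 130.0), (340.0, 360.0), (90.0, 90.0)]
profile = [mean_and_sem(pair) for pair in duplicates]
```

With exactly two replicates the standard error reduces to half the absolute difference between the two signals, so it is only a rough indication of reproducibility at any single time point.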

To objectively determine which genes ex- 
hibited a circadian pattern of expression, we 
empirically tested for statistically significant 
cross-correlation between the temporal ex- 
pression profiles of each probe set and cosine 
waves of defined period and phase. Genes 
with a greater than 95% probable correlation 
with a cosine test wave with a period between 
20 and 28 hours were scored as circadian- 
regulated (9). This analysis is independent of 
signal strength and imposes no minimal 
change in amplitude. According to this crite- 